Abstract

We generalize the exponential family of probability distributions $\mathcal{E}_p$. In our approach, the exponential function is replaced by a $\varphi$-function, resulting in the $\varphi$-family of probability distributions $\mathcal{F}_c^\varphi$. We show how $\varphi$-families are constructed. In the $\varphi$-family, the analogue of the cumulant-generating functional is a normalizing function. We define the $\varphi$-divergence as the Bregman divergence associated to the normalizing function, providing a generalization of the Kullback-Leibler divergence. We found that Kaniadakis' $\kappa$-exponential function satisfies the definition of $\varphi$-functions. A formula for the $\varphi$-divergence where the $\varphi$-function is the $\kappa$-exponential function is derived.

Keywords: Exponential family of probability distributions, Musielak-Orlicz 
spaces, Bregman divergence 



1. Introduction 

Let $(T, \Sigma, \mu)$ be a $\sigma$-finite, non-atomic measure space. We denote by $\mathcal{P}_\mu = \mathcal{P}(T, \Sigma, \mu)$ the family of all probability measures on $T$ that are equivalent to the measure $\mu$. The probability family $\mathcal{P}_\mu$ can be represented as (we adopt the same symbol $\mathcal{P}_\mu$ for this representation)

$$\mathcal{P}_\mu = \{p \in L^0 : p > 0 \text{ and } E[p] = 1\},$$

where $L^0$ is the linear space of all real-valued, measurable functions on $T$, with equality $\mu$-a.e., and $E[\cdot]$ denotes the expectation with respect to the measure $\mu$.

The family $\mathcal{P}_\mu$ can be equipped with a structure of $C^\infty$-Banach manifold, using the Orlicz space $L^{\Phi_1}(p) = L^{\Phi_1}(T, \Sigma, p \cdot \mu)$ associated to the Orlicz function $\Phi_1(u) = \exp(u) - 1$, for $u \geq 0$. With this structure, $\mathcal{P}_\mu$ is called the exponential statistical manifold, whose construction was proposed in [1] and developed in [?]. Each connected component of the exponential statistical manifold gives rise to an exponential family of probability distributions $\mathcal{E}_p$ (for each $p \in \mathcal{P}_\mu$). Each element of $\mathcal{E}_p$ can be expressed as

$$e_p(u) = e^{u - K_p(u)} p, \quad \text{for } u \in \mathcal{B}_p, \tag{1}$$

for a subset $\mathcal{B}_p$ of the Orlicz space $L^{\Phi_1}(p)$. Here $K_p$ is the cumulant-generating functional $K_p(u) = \log E_p[e^u]$, where $E_p[\cdot]$ is the expectation with respect to $p \cdot \mu$. If $c$ is a measurable function such that $p = e^c$, then (1) can be rewritten as

$$e_p(u) = e^{c + u - K_p(u) \cdot 1_T}, \quad \text{for } u \in \mathcal{B}_p, \tag{2}$$

where $1_A$ is the indicator function of a subset $A \subseteq T$.

In the $\varphi$-family of probability distributions $\mathcal{F}_c^\varphi$, which we propose, the exponential function is replaced by the so-called $\varphi$-function $\varphi: T \times \mathbb{R} \to [0, \infty]$. The function $\varphi(t, \cdot)$ has a "shape" which is similar to that of an exponential function, with an arbitrary rate of increase. For example, we found that the $\kappa$-exponential function satisfies the definition of $\varphi$-functions. As in the exponential family, the $\varphi$-families are the connected components of $\mathcal{P}_\mu$, which is endowed with a structure of $C^\infty$-Banach manifold, using $\varphi$ in the place of an exponential function. Let $c$ be any measurable function such that $\varphi(t, c(t))$ belongs to $\mathcal{P}_\mu$. The elements of the $\varphi$-family of probability distributions $\mathcal{F}_c^\varphi$ are given by

$$\boldsymbol{\varphi}_c(u)(t) = \varphi(t, c(t) + u(t) - \psi(u) u_0(t)), \quad \text{for } u \in \mathcal{B}_c^\varphi, \tag{3}$$

for a subset $\mathcal{B}_c^\varphi$ of a Musielak-Orlicz space $L_c^\varphi$. The normalizing function $\psi: \mathcal{B}_c^\varphi \to [0, \infty)$ and the measurable function $u_0: T \to (0, \infty)$ in (3) replace $K_p$ and $1_T$ in (2), respectively. The function $u_0$ is not arbitrary. In the text, we will show how $u_0$ can be chosen.

We define the $\varphi$-divergence as the Bregman divergence associated to the normalizing function $\psi$, providing a generalization of the Kullback-Leibler divergence. Then geometrical aspects related to the $\varphi$-family can be developed, since the Fisher information (on which Information Geometry [?] is based) is derived from the divergence. A formula for the $\varphi$-divergence where the $\varphi$-function is the $\kappa$-exponential function is derived, which we call the $\kappa$-divergence.

We expect that an extension of our work will provide advances in other areas, like in Information Geometry or in the non-parametric, non-commutative setting. The rest of this paper is organized as follows. Section 2 deals with the topics of Musielak-Orlicz spaces we will use in the construction of the $\varphi$-family of probability distributions. In Section 3 the exponential statistical manifold is reviewed. The construction of the $\varphi$-family of probability distributions is given in Section 4. Finally, the $\varphi$-divergence is derived in Section 5.

2. Musielak-Orlicz spaces

In this section we provide a brief introduction to Musielak-Orlicz (function) spaces, which are used in the construction of the exponential and $\varphi$-families. A more detailed exposition about these spaces can be found in [?].

We say that $\Phi: T \times [0, \infty] \to [0, \infty]$ is a Musielak-Orlicz function when, for $\mu$-a.e. $t \in T$,

(i) $\Phi(t, \cdot)$ is convex and lower semi-continuous,

(ii) $\Phi(t, 0) = \lim_{u \downarrow 0} \Phi(t, u) = 0$ and $\Phi(t, \infty) = \infty$,

(iii) $\Phi(\cdot, u)$ is measurable for all $u \geq 0$.

Items (i)-(ii) guarantee that $\Phi(t, \cdot)$ is not identically equal to $0$ or $\infty$ on the interval $(0, \infty)$. A Musielak-Orlicz function $\Phi$ is said to be an Orlicz function if the functions $\Phi(t, \cdot)$ are identical for $\mu$-a.e. $t \in T$.

Define the functional $I_\Phi(u) = \int_T \Phi(t, |u(t)|)\, d\mu$, for any $u \in L^0$. The Musielak-Orlicz space, Musielak-Orlicz class, and Morse-Transue space are given by

$$L^\Phi = \{u \in L^0 : I_\Phi(\lambda u) < \infty \text{ for some } \lambda > 0\},$$

$$\tilde{L}^\Phi = \{u \in L^0 : I_\Phi(u) < \infty\},$$

and

$$E^\Phi = \{u \in L^0 : I_\Phi(\lambda u) < \infty \text{ for all } \lambda > 0\},$$

respectively. If the underlying measure space $(T, \Sigma, \mu)$ has to be specified, we write $L^\Phi(T, \Sigma, \mu)$, $\tilde{L}^\Phi(T, \Sigma, \mu)$ and $E^\Phi(T, \Sigma, \mu)$ in the place of $L^\Phi$, $\tilde{L}^\Phi$ and $E^\Phi$, respectively. Clearly, $E^\Phi \subseteq \tilde{L}^\Phi \subseteq L^\Phi$. The Musielak-Orlicz space $L^\Phi$ can be interpreted as the smallest vector subspace of $L^0$ that contains $\tilde{L}^\Phi$, and $E^\Phi$ is the largest vector subspace of $L^0$ that is contained in $\tilde{L}^\Phi$.

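The Musielak-Orlicz space $L^\Phi$ is a Banach space when it is endowed with the Luxemburg norm

$$\|u\|_\Phi = \inf\Bigl\{\lambda > 0 : I_\Phi\Bigl(\frac{u}{\lambda}\Bigr) \leq 1\Bigr\},$$

or the Orlicz norm

$$\|u\|_{\Phi,0} = \sup\Bigl\{\Bigl|\int_T uv\, d\mu\Bigr| : v \in \tilde{L}^{\Phi^*} \text{ and } I_{\Phi^*}(v) \leq 1\Bigr\},$$

where $\Phi^*(t, v) = \sup_{u \geq 0}(uv - \Phi(t, u))$ is the Fenchel conjugate of $\Phi(t, \cdot)$. These norms are equivalent and the inequalities $\|u\|_\Phi \leq \|u\|_{\Phi,0} \leq 2\|u\|_\Phi$ hold for all $u \in L^\Phi$.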
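The Luxemburg norm defined above can be approximated numerically. The following is a minimal sketch, assuming a finite measure space (a list of point masses `weights`, which is only a toy stand-in for the non-atomic setting of the paper) and the Orlicz function $e^u - 1$; the function names and the data are hypothetical, chosen only for illustration.

```python
import math

def modular(u, weights, phi):
    """I_Phi(u) = sum_t Phi(|u(t)|) * mu({t}) on a finite measure space."""
    return sum(w * phi(abs(x)) for x, w in zip(u, weights))

def luxemburg_norm(u, weights, phi, tol=1e-12):
    """Smallest lam > 0 with I_Phi(u / lam) <= 1, located by bisection."""
    if all(x == 0 for x in u):
        return 0.0
    lo, hi = 0.0, 1.0
    # Grow hi until the modular of u / hi drops to at most 1.
    while modular([x / hi for x in u], weights, phi) > 1.0:
        hi *= 2.0
    # Invariant: modular(u / hi) <= 1 and (lo == 0 or modular(u / lo) > 1).
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if modular([x / mid for x in u], weights, phi) <= 1.0:
            hi = mid
        else:
            lo = mid
    return hi

def phi1(x):
    """The Orlicz function exp(x) - 1."""
    return math.exp(x) - 1.0

weights = [0.25, 0.25, 0.5]   # hypothetical point masses
u = [1.0, -2.0, 0.5]          # hypothetical function on three points
norm_u = luxemburg_norm(u, weights, phi1)
```

Because the modular is continuous and strictly decreasing in $\lambda$ for this choice of $\Phi$, the infimum is attained where the modular equals 1, which is what the bisection locates.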

If we can find a non-negative function $f \in \tilde{L}^\Phi$ and a constant $K > 0$ such that

$$\Phi(t, 2u) \leq K\Phi(t, u), \quad \text{for all } u \geq f(t),$$

then we say that $\Phi$ satisfies the $\Delta_2$-condition, or belongs to the $\Delta_2$-class (denoted by $\Phi \in \Delta_2$). When the Musielak-Orlicz function $\Phi$ satisfies the $\Delta_2$-condition, $E^\Phi$ coincides with $L^\Phi$. On the other hand, if $\Phi$ is finite-valued and does not satisfy the $\Delta_2$-condition, then the Musielak-Orlicz class $\tilde{L}^\Phi$ is not open and its interior coincides with

$$B_0(E^\Phi, 1) = \Bigl\{u \in L^\Phi : \inf_{v \in E^\Phi} \|u - v\|_{\Phi,0} < 1\Bigr\},$$

or, equivalently, $B_0(E^\Phi, 1) \subsetneq \tilde{L}^\Phi \subsetneq \overline{B}_0(E^\Phi, 1)$.

3. The exponential statistical manifold 

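This section starts with the definition of a $C^k$-Banach manifold. A $C^k$-Banach manifold is a set $M$ and a collection of pairs $(U_\alpha, \boldsymbol{x}_\alpha)$ ($\alpha$ belonging to some indexing set), composed of open subsets $U_\alpha$ of some Banach space $X_\alpha$, and injective mappings $\boldsymbol{x}_\alpha: U_\alpha \to M$, satisfying the following conditions:

(bm1) the sets $\boldsymbol{x}_\alpha(U_\alpha)$ cover $M$, i.e., $\bigcup_\alpha \boldsymbol{x}_\alpha(U_\alpha) = M$;

(bm2) for any pair of indices $\alpha, \beta$ such that $\boldsymbol{x}_\alpha(U_\alpha) \cap \boldsymbol{x}_\beta(U_\beta) = W \neq \emptyset$, the sets $\boldsymbol{x}_\alpha^{-1}(W)$ and $\boldsymbol{x}_\beta^{-1}(W)$ are open in $X_\alpha$ and $X_\beta$, respectively; and

(bm3) the transition map $\boldsymbol{x}_\beta^{-1} \circ \boldsymbol{x}_\alpha: \boldsymbol{x}_\alpha^{-1}(W) \to \boldsymbol{x}_\beta^{-1}(W)$ is a $C^k$-isomorphism.

The pair $(U_\alpha, \boldsymbol{x}_\alpha)$ with $p \in \boldsymbol{x}_\alpha(U_\alpha)$ is called a parametrization (or system of coordinates) of $M$ at $p$; and $\boldsymbol{x}_\alpha(U_\alpha)$ is said to be a coordinate neighborhood at $p$.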

The set $M$ can be endowed with a topology in a unique way such that each $\boldsymbol{x}_\alpha(U_\alpha)$ is open, and the $\boldsymbol{x}_\alpha$'s are topological isomorphisms. We note that if $k \geq 1$ and two parametrizations $(U_\alpha, \boldsymbol{x}_\alpha)$ and $(U_\beta, \boldsymbol{x}_\beta)$ are such that $\boldsymbol{x}_\alpha(U_\alpha)$ and $\boldsymbol{x}_\beta(U_\beta)$ have a non-empty intersection, then from the derivative of $\boldsymbol{x}_\beta^{-1} \circ \boldsymbol{x}_\alpha$ we have that $X_\alpha$ and $X_\beta$ are isomorphic.

Two collections $\{(U_\alpha, \boldsymbol{x}_\alpha)\}$ and $\{(V_\beta, \boldsymbol{x}_\beta)\}$ satisfying (bm1)-(bm3) are said to be $C^k$-compatible if their union also satisfies (bm1)-(bm3). It can be verified that the relation of $C^k$-compatibility is an equivalence relation. An equivalence class of $C^k$-compatible collections $\{(U_\alpha, \boldsymbol{x}_\alpha)\}$ on $M$ is said to define a $C^k$-differentiable structure on $M$.

Now we review the construction of the exponential statistical manifold. We consider the Musielak-Orlicz space $L^{\Phi_1}(p) = L^{\Phi_1}(T, \Sigma, p \cdot \mu)$, where the Orlicz function $\Phi_1: [0, \infty) \to [0, \infty)$ is given by $\Phi_1(u) = e^u - 1$, and $p$ is a probability density in $\mathcal{P}_\mu$. The space $L^{\Phi_1}(p)$ corresponds to the set of all functions $u \in L^0$ whose moment-generating function $u_p(t) = E_p[e^{tu}]$ is finite in a neighborhood of $0$.

For every function $u \in L^0$ we define the moment-generating functional

$$M_p(u) = E_p[e^u],$$

and the cumulant-generating functional

$$K_p(u) = \log M_p(u).$$

Clearly, these functionals are not expected to be finite for every $u \in L^0$. Denote by $\mathcal{K}_p$ the interior of the set of all functions $u \in L^{\Phi_1}(p)$ whose moment-generating functional $M_p(u)$ is finite. Equivalently, a function $u \in L^{\Phi_1}(p)$ belongs to $\mathcal{K}_p$ if and only if $M_p(\lambda u)$ is finite for every $\lambda$ in some neighborhood of $[0, 1]$. The closed subspace of $p$-centered random variables

$$B_p = \{u \in L^{\Phi_1}(p) : E_p[u] = 0\}$$

is taken to be the coordinate Banach space. The exponential parametrization $e_p: \mathcal{B}_p \to \mathcal{E}_p$ maps $\mathcal{B}_p = B_p \cap \mathcal{K}_p$ to the exponential family $\mathcal{E}_p = e_p(\mathcal{B}_p) \subseteq \mathcal{P}_\mu$, according to

$$e_p(u) = e^{u - K_p(u)} p, \quad \text{for all } u \in \mathcal{B}_p.$$

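The map $e_p$ is a bijection from $\mathcal{B}_p$ to its image $\mathcal{E}_p = e_p(\mathcal{B}_p)$, whose inverse $e_p^{-1}: \mathcal{E}_p \to \mathcal{B}_p$ can be expressed as

$$e_p^{-1}(q) = \log\Bigl(\frac{q}{p}\Bigr) - E_p\Bigl[\log\Bigl(\frac{q}{p}\Bigr)\Bigr], \quad \text{for } q \in \mathcal{E}_p.$$

Since $K_p(u) < \infty$ for every $u \in \mathcal{K}_p$, we have that $e_p$ can be extended to $\mathcal{K}_p$. The restriction of $e_p$ to $\mathcal{B}_p$ guarantees that $e_p$ is bijective.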
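The exponential parametrization and its inverse can be checked on a toy example. This is a sketch only, assuming a three-point space with counting measure (not the non-atomic setting of the paper); the helper names `expectation`, `e_p` and `e_p_inverse` and all numerical values are hypothetical.

```python
import math

def expectation(f, p, mu):
    """E_p[f] = sum_t f(t) p(t) mu({t})."""
    return sum(ft * pt * mt for ft, pt, mt in zip(f, p, mu))

def e_p(u, p, mu):
    """Exponential parametrization: exp(u - K_p(u)) * p."""
    K = math.log(expectation([math.exp(x) for x in u], p, mu))
    return [math.exp(x - K) * pt for x, pt in zip(u, p)]

def e_p_inverse(q, p, mu):
    """Inverse chart: log(q/p) - E_p[log(q/p)]."""
    log_ratio = [math.log(qt / pt) for qt, pt in zip(q, p)]
    mean = expectation(log_ratio, p, mu)
    return [x - mean for x in log_ratio]

mu = [1.0, 1.0, 1.0]            # counting measure (assumed)
p = [0.2, 0.3, 0.5]             # reference density
u_raw = [0.4, -1.0, 0.7]
m = expectation(u_raw, p, mu)
u = [x - m for x in u_raw]      # centre u so that E_p[u] = 0
q = e_p(u, p, mu)               # a density in the exponential family at p
u_back = e_p_inverse(q, p, mu)  # recovers u up to rounding
```

Centering `u` is what makes the round trip exact: without the constraint $E_p[u] = 0$, adding a constant to `u` would give the same density `q`, which is why the chart is restricted to centered functions.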

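Given two probability densities $p$ and $q$ in the same connected component of $\mathcal{P}_\mu$, the exponential probability families $\mathcal{E}_p$ and $\mathcal{E}_q$ coincide, and the exponential spaces $L^{\Phi_1}(p)$ and $L^{\Phi_1}(q)$ are isomorphic (see [2, Proposition 5]). Hence, $\mathcal{B}_p = e_p^{-1}(\mathcal{E}_p \cap \mathcal{E}_q)$ and $\mathcal{B}_q = e_q^{-1}(\mathcal{E}_p \cap \mathcal{E}_q)$. The transition map $e_q^{-1} \circ e_p: \mathcal{B}_p \to \mathcal{B}_q$, which can be written as

$$e_q^{-1} \circ e_p(u) = u + \log\Bigl(\frac{p}{q}\Bigr) - E_q\Bigl[u + \log\Bigl(\frac{p}{q}\Bigr)\Bigr], \quad \text{for all } u \in \mathcal{B}_p,$$

is a $C^\infty$-function. Clearly, $\bigcup_{p \in \mathcal{P}_\mu} e_p(\mathcal{B}_p) = \mathcal{P}_\mu$. Thus the collection $\{(\mathcal{B}_p, e_p)\}_{p \in \mathcal{P}_\mu}$ satisfies (bm1)-(bm3). Hence $\mathcal{P}_\mu$ is a $C^\infty$-Banach manifold, which is called the exponential statistical manifold.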



4. Construction of the $\varphi$-family of probability distributions

The generalization of the exponential family is based on the replacement of the exponential function by a $\varphi$-function $\varphi: T \times \mathbb{R} \to [0, \infty]$ that satisfies the following properties, for $\mu$-a.e. $t \in T$:

(a1) $\varphi(t, \cdot)$ is convex and injective,

(a2) $\varphi(t, -\infty) = 0$ and $\varphi(t, \infty) = \infty$,

(a3) $\varphi(\cdot, u)$ is measurable for all $u \in \mathbb{R}$.

In addition, we assume a positive, measurable function $u_0: T \to (0, \infty)$ can be found such that, for every measurable function $c: T \to \mathbb{R}$ for which $\varphi(t, c(t))$ is in $\mathcal{P}_\mu$, we have that

(a4) $\varphi(t, c(t) + \lambda u_0(t))$ is $\mu$-integrable for all $\lambda > 0$.

The choice for $\varphi(t, \cdot)$ injective with image $[0, \infty]$ is justified by the fact that a parametrization of $\mathcal{P}_\mu$ maps real-valued functions to positive functions. Moreover, by (a1), $\varphi(t, \cdot)$ is continuous and strictly increasing. From (a3), the function $\varphi(t, u(t))$ is measurable if and only if $u: T \to \mathbb{R}$ is measurable. Replacing $\varphi(t, u)$ by $\varphi(t, u_0(t)u)$, a "new" function $u_0 = 1$ is obtained satisfying (a4).

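Example 1 ([?]). The Kaniadakis' $\kappa$-exponential $\exp_\kappa: \mathbb{R} \to (0, \infty)$ for $\kappa \in [-1, 1]$ is defined as

$$\exp_\kappa(u) = \begin{cases} (\kappa u + \sqrt{1 + \kappa^2 u^2})^{1/\kappa}, & \text{if } \kappa \neq 0, \\ \exp(u), & \text{if } \kappa = 0. \end{cases}$$

The inverse of $\exp_\kappa$ is the Kaniadakis' $\kappa$-logarithm

$$\ln_\kappa(u) = \begin{cases} \dfrac{u^\kappa - u^{-\kappa}}{2\kappa}, & \text{if } \kappa \neq 0, \\ \ln(u), & \text{if } \kappa = 0. \end{cases}$$

Some algebraic properties of the ordinary exponential and logarithm functions are preserved:

$$\exp_\kappa(u) \exp_\kappa(-u) = 1, \qquad \ln_\kappa(u) + \ln_\kappa(u^{-1}) = 0.$$

For a measurable function $\kappa: T \to [-1, 1]$, we define the variable $\kappa$-exponential $\exp_\kappa: T \times \mathbb{R} \to (0, \infty)$ as

$$\exp_\kappa(t, u) = \exp_{\kappa(t)}(u),$$

whose inverse is called the variable $\kappa$-logarithm:

$$\ln_\kappa(t, u) = \ln_{\kappa(t)}(u).$$

Assuming that $\kappa_- = \operatorname{ess\,inf} |\kappa(t)| > 0$, the variable $\kappa$-exponential $\exp_\kappa$ satisfies (a1)-(a4). The verification of (a1)-(a3) is easy. Moreover, we notice that $\exp_\kappa(t, \cdot)$ is strictly convex. We can write for $\alpha \geq 1$

$$\begin{aligned} \exp_\kappa(t, \alpha u) &= \bigl(\kappa(t)\alpha u + \alpha\sqrt{1/\alpha^2 + \kappa(t)^2 u^2}\bigr)^{1/\kappa(t)} \\ &\leq \alpha^{1/|\kappa(t)|} \bigl(\kappa(t) u + \sqrt{1 + \kappa(t)^2 u^2}\bigr)^{1/\kappa(t)} \\ &\leq \alpha^{1/\kappa_-} \exp_\kappa(t, u). \end{aligned}$$

By the convexity of $\exp_\kappa(t, \cdot)$, we obtain for any $\lambda \in (0, 1)$

$$\begin{aligned} \exp_\kappa(t, c + u) &\leq \lambda \exp_\kappa(t, \lambda^{-1} c) + (1 - \lambda) \exp_\kappa(t, (1 - \lambda)^{-1} u) \\ &\leq \lambda^{1 - 1/\kappa_-} \exp_\kappa(t, c) + (1 - \lambda)^{1 - 1/\kappa_-} \exp_\kappa(t, u). \end{aligned}$$

Thus any positive function $u_0$ such that $E[\exp_\kappa(u_0)] < \infty$ satisfies (a4).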
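The Kaniadakis $\kappa$-exponential and $\kappa$-logarithm of Example 1 are easy to evaluate directly. The sketch below implements the standard closed forms for a constant $\kappa$ (the function names are ours) and can be used to spot-check the inverse relation, the identity $\exp_\kappa(u)\exp_\kappa(-u) = 1$, and the growth bound $\exp_\kappa(\alpha u) \leq \alpha^{1/|\kappa|}\exp_\kappa(u)$ for $\alpha \geq 1$.

```python
import math

def exp_kappa(u, kappa):
    """Kaniadakis kappa-exponential; reduces to exp(u) when kappa == 0."""
    if kappa == 0:
        return math.exp(u)
    return (kappa * u + math.sqrt(1.0 + (kappa * u) ** 2)) ** (1.0 / kappa)

def ln_kappa(x, kappa):
    """Kaniadakis kappa-logarithm, the inverse of exp_kappa on (0, inf)."""
    if kappa == 0:
        return math.log(x)
    return (x ** kappa - x ** (-kappa)) / (2.0 * kappa)

kappa = 0.5
value = exp_kappa(1.5, kappa)     # (0.75 + 1.25) ** 2
back = ln_kappa(value, kappa)     # recovers 1.5 up to rounding
```

For large negative arguments the sum `kappa * u + sqrt(...)` suffers from cancellation, so a production implementation would probably want a more careful formula; this sketch is only meant for moderate values.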
Let $c: T \to \mathbb{R}$ be a measurable function such that $\varphi(t, c(t))$ is $\mu$-integrable. We define the Musielak-Orlicz function

$$\Phi(t, u) = \varphi(t, c(t) + u) - \varphi(t, c(t)),$$

and denote $L^\Phi$, $\tilde{L}^\Phi$ and $E^\Phi$ by $L_c^\varphi$, $\tilde{L}_c^\varphi$ and $E_c^\varphi$, respectively. Since $\varphi(t, c(t))$ is $\mu$-integrable, the Musielak-Orlicz space $L_c^\varphi$ corresponds to the set of all functions $u \in L^0$ for which $\varphi(t, c(t) + \lambda u(t))$ is $\mu$-integrable for every $\lambda$ contained in some neighborhood of $0$. By the convexity of $\varphi(t, \cdot)$, we have

$$u\varphi'(t, c(t)) \leq \varphi(t, c(t) + u) - \varphi(t, c(t)), \quad \text{for all } u \in \mathbb{R}. \tag{4}$$

Hence every function $u$ in $L_c^\varphi$ belongs to the weighted Lebesgue space $L_w^1(\mu)$ where $w(t) = \varphi'(t, c(t))$.

Let $\mathcal{K}_c^\varphi$ be the set of all functions $u \in L_c^\varphi$ such that $\varphi(t, c(t) + \lambda u(t))$ is $\mu$-integrable for every $\lambda$ in a neighborhood of $[0, 1]$. Denote by $\boldsymbol{\varphi}$ the operator acting on the set of real-valued functions $u: T \to \mathbb{R}$ given by $\boldsymbol{\varphi}(u)(t) = \varphi(t, u(t))$. For each probability density $p \in \mathcal{P}_\mu$, we can take a measurable function $c: T \to \mathbb{R}$ such that $p = \boldsymbol{\varphi}(c)$. The first important result in the construction of the $\varphi$-family is given below.

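Lemma 2. The set $\mathcal{K}_c^\varphi$ is open in $L_c^\varphi$.

Proof. Take any $u \in \mathcal{K}_c^\varphi$. We can find $\varepsilon \in (0, 1)$ such that $E[\boldsymbol{\varphi}(c + \alpha u)] < \infty$ for every $\alpha \in [-\varepsilon, 1 + \varepsilon]$. Let $\delta = [\frac{2}{\varepsilon}(1 + \varepsilon)(1 + \frac{\varepsilon}{2})]^{-1}$. For any function $v \in L_c^\varphi$ in the open ball $B_\delta = \{w \in L_c^\varphi : \|w\|_\Phi < \delta\}$, we have $I_\Phi(\frac{v}{\delta}) \leq 1$. Thus $E[\boldsymbol{\varphi}(c + \frac{1}{\delta}|v|)] \leq 2$. Taking any $\alpha \in (0, 1 + \frac{\varepsilon}{2})$, we denote $\lambda = \frac{\alpha}{1 + \varepsilon}$. In virtue of

$$\frac{\alpha}{1 - \lambda} = \frac{\alpha}{1 - \frac{\alpha}{1 + \varepsilon}} \leq \frac{1 + \frac{\varepsilon}{2}}{1 - \frac{1 + \varepsilon/2}{1 + \varepsilon}} = \frac{2}{\varepsilon}(1 + \varepsilon)\Bigl(1 + \frac{\varepsilon}{2}\Bigr) = \frac{1}{\delta},$$

it follows that

$$\begin{aligned} \boldsymbol{\varphi}(c + \alpha(u + v)) &= \boldsymbol{\varphi}\bigl(\lambda(c + \tfrac{\alpha}{\lambda} u) + (1 - \lambda)(c + \tfrac{\alpha}{1 - \lambda} v)\bigr) \\ &\leq \lambda \boldsymbol{\varphi}(c + \tfrac{\alpha}{\lambda} u) + (1 - \lambda) \boldsymbol{\varphi}(c + \tfrac{\alpha}{1 - \lambda} v) \\ &\leq \lambda \boldsymbol{\varphi}(c + (1 + \varepsilon) u) + (1 - \lambda) \boldsymbol{\varphi}(c + \tfrac{1}{\delta}|v|). \end{aligned} \tag{5}$$

For $\alpha \in (-\frac{\varepsilon}{2}, 0)$, we can write

$$\begin{aligned} \boldsymbol{\varphi}(c + \alpha(u + v)) &\leq \tfrac{1}{2} \boldsymbol{\varphi}(c + 2\alpha u) + \tfrac{1}{2} \boldsymbol{\varphi}(c + 2\alpha v) \\ &\leq \tfrac{1}{2} \boldsymbol{\varphi}(c + 2\alpha u) + \tfrac{1}{2} \boldsymbol{\varphi}(c + \tfrac{1}{\delta}|v|). \end{aligned} \tag{6}$$

By (5) and (6), we get $E[\boldsymbol{\varphi}(c + \alpha(u + v))] < \infty$, for any $\alpha \in (-\frac{\varepsilon}{2}, 1 + \frac{\varepsilon}{2})$. Hence the ball of radius $\delta$ centered at $u$ is contained in $\mathcal{K}_c^\varphi$. Therefore, the set $\mathcal{K}_c^\varphi$ is open. □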
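Clearly, for $u \in \mathcal{K}_c^\varphi$ the function $\boldsymbol{\varphi}(c + u)$ is not necessarily in $\mathcal{P}_\mu$. The normalizing function $\psi: \mathcal{K}_c^\varphi \to \mathbb{R}$ is introduced in order to make the density

$$\boldsymbol{\varphi}(c + u - \psi(u) u_0)$$

contained in $\mathcal{P}_\mu$, for any $u \in \mathcal{K}_c^\varphi$. We have to find the functions for which the normalizing function exists. For a function $u \in L_c^\varphi$, suppose that $\boldsymbol{\varphi}(c + u - \alpha u_0)$ is $\mu$-integrable for some $\alpha \in \mathbb{R}$. Then $u$ is in the closure of the set $\mathcal{K}_c^\varphi$. Indeed, for any $\lambda \in (0, 1)$,

$$\begin{aligned} \boldsymbol{\varphi}(c + \lambda u) &= \boldsymbol{\varphi}\bigl(\lambda(c + u - \alpha u_0) + (1 - \lambda)(c + \tfrac{\lambda}{1 - \lambda}\alpha u_0)\bigr) \\ &\leq \lambda \boldsymbol{\varphi}(c + u - \alpha u_0) + (1 - \lambda) \boldsymbol{\varphi}(c + \tfrac{\lambda}{1 - \lambda}\alpha u_0). \end{aligned}$$

Since the function $u_0$ satisfies (a4), we obtain that $\boldsymbol{\varphi}(c + \lambda u)$ is $\mu$-integrable. Hence the maximal, open domain of $\psi$ is contained in $\mathcal{K}_c^\varphi$.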

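Proposition 3. If the function $u$ is in $\mathcal{K}_c^\varphi$, then there exists a unique $\psi(u) \in \mathbb{R}$ for which $\boldsymbol{\varphi}(c + u - \psi(u) u_0)$ is a probability density in $\mathcal{P}_\mu$.

Proof. We will show that if the function $u$ is in $\mathcal{K}_c^\varphi$, then $\boldsymbol{\varphi}(c + u + \alpha u_0)$ is $\mu$-integrable for every $\alpha \in \mathbb{R}$. Since $u$ is in $\mathcal{K}_c^\varphi$, we can find $\varepsilon > 0$ such that $\boldsymbol{\varphi}(c + (1 + \varepsilon)u)$ is $\mu$-integrable. Taking $\lambda = \frac{1}{1 + \varepsilon}$, we can write

$$\begin{aligned} \boldsymbol{\varphi}(c + u + \alpha u_0) &= \boldsymbol{\varphi}\bigl(\lambda(c + \tfrac{1}{\lambda} u) + (1 - \lambda)(c + \tfrac{1}{1 - \lambda}\alpha u_0)\bigr) \\ &\leq \lambda \boldsymbol{\varphi}(c + \tfrac{1}{\lambda} u) + (1 - \lambda) \boldsymbol{\varphi}(c + \tfrac{1}{1 - \lambda}\alpha u_0). \end{aligned}$$

Thus $\boldsymbol{\varphi}(c + u + \alpha u_0)$ is $\mu$-integrable. By the Dominated Convergence Theorem, the map $\alpha \mapsto J(\alpha) = E[\boldsymbol{\varphi}(c + u + \alpha u_0)]$ is continuous, tends to $0$ as $\alpha \to -\infty$, and goes to infinity as $\alpha \to \infty$. Since $\varphi(t, \cdot)$ is strictly increasing, it follows that $J(\alpha)$ is also strictly increasing. Therefore, there exists a unique $\psi(u) \in \mathbb{R}$ for which $\boldsymbol{\varphi}(c + u - \psi(u) u_0)$ is a probability density in $\mathcal{P}_\mu$. □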
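The existence argument in Proposition 3 is constructive enough to be turned into a root-finding routine: the total mass of $\varphi(c + u - a u_0)$ is continuous and strictly decreasing in $a$, so the normalizing value can be bracketed and bisected. The sketch below assumes a finite space with point masses `mu` (a toy stand-in for the paper's measure space); `normalizing_psi` and the sample data are hypothetical names and values. With $\varphi = \exp$ and $u_0 = 1$ the result should agree with the cumulant-generating functional $\log E_p[e^u]$.

```python
import math

def normalizing_psi(u, c, u0, mu, phi, iters=200):
    """Solve sum_t phi(c + u - a*u0) mu = 1 for a by bisection."""
    def mass(a):
        return sum(phi(ct + ut - a * u0t) * mt
                   for ct, ut, u0t, mt in zip(c, u, u0, mu))
    lo, hi = -1.0, 1.0
    while mass(lo) < 1.0:     # mass decreases in a: move lo left until mass >= 1
        lo *= 2.0
    while mass(hi) > 1.0:     # move hi right until mass <= 1
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mass(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

mu = [1.0, 1.0, 1.0]                 # counting measure (assumed)
p = [0.2, 0.3, 0.5]
c = [math.log(pt) for pt in p]       # p = exp(c)
u0 = [1.0, 1.0, 1.0]
u = [0.4, -1.0, 0.7]
psi_u = normalizing_psi(u, c, u0, mu, math.exp)
density = [math.exp(ct + ut - psi_u * u0t) for ct, ut, u0t in zip(c, u, u0)]
```

Any other increasing convex `phi` (for instance a $\kappa$-exponential) can be passed in place of `math.exp`; only the monotonicity of the mass in `a` is used.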

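The function $\psi: \mathcal{K}_c^\varphi \to \mathbb{R}$ can take both positive and negative values. However, if the domain of $\psi$ is restricted to a subspace of $L_c^\varphi$, its image will be contained in $[0, \infty)$. Denote the closed subspace

$$B_c^\varphi = \{u \in L_c^\varphi : E[u\boldsymbol{\varphi}'(c)] = 0\},$$

and let $\mathcal{B}_c^\varphi = B_c^\varphi \cap \mathcal{K}_c^\varphi$. Supposing that $u \in \mathcal{B}_c^\varphi$, it follows that $E[u\boldsymbol{\varphi}'(c)] = 0$ and $E[\boldsymbol{\varphi}(c + u)] < \infty$; and, according to inequality (4), we have

$$1 = E[u\boldsymbol{\varphi}'(c)] + E[\boldsymbol{\varphi}(c)] \leq E[\boldsymbol{\varphi}(c + u)] < \infty.$$

If $u \in \mathcal{K}_c^\varphi$ belongs to the subspace $B_c^\varphi$, the integral of $\boldsymbol{\varphi}(c + u)$ is greater than or equal to $1$. Subtracting $\psi(u) u_0$, the integral decreases to $1$, and we obtain that $\boldsymbol{\varphi}(c + u - \psi(u) u_0)$ is in $\mathcal{P}_\mu$.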

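For each measurable function $c: T \to \mathbb{R}$ such that the probability density $p = \boldsymbol{\varphi}(c)$ belongs to $\mathcal{P}_\mu$, we associate a parametrization $\boldsymbol{\varphi}_c: \mathcal{B}_c^\varphi \to \mathcal{F}_c^\varphi$ that maps each function $u$ in $\mathcal{B}_c^\varphi$ to a probability density in $\mathcal{F}_c^\varphi = \boldsymbol{\varphi}_c(\mathcal{B}_c^\varphi) \subseteq \mathcal{P}_\mu$ according to

$$\boldsymbol{\varphi}_c(u) = \boldsymbol{\varphi}(c + u - \psi(u) u_0).$$

Clearly, we have $\mathcal{P}_\mu = \bigcup\{\mathcal{F}_c^\varphi : \boldsymbol{\varphi}(c) \in \mathcal{P}_\mu\}$. Moreover, the map $\boldsymbol{\varphi}_c$ is a bijection from $\mathcal{B}_c^\varphi$ to $\mathcal{F}_c^\varphi$. If the functions $u, v \in \mathcal{B}_c^\varphi$ are such that $\boldsymbol{\varphi}_c(u) = \boldsymbol{\varphi}_c(v)$, then the difference $u - v = (\psi(u) - \psi(v)) u_0$ is in $B_c^\varphi$. Since $E[u_0 \boldsymbol{\varphi}'(c)] > 0$, we get $\psi(u) = \psi(v)$ and then $u = v$.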

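Suppose that the measurable functions $c_1, c_2: T \to \mathbb{R}$ are such that $p_1 = \boldsymbol{\varphi}(c_1)$ and $p_2 = \boldsymbol{\varphi}(c_2)$ belong to $\mathcal{P}_\mu$. The parametrizations $\boldsymbol{\varphi}_{c_1}: \mathcal{B}_{c_1}^\varphi \to \mathcal{F}_{c_1}^\varphi$ and $\boldsymbol{\varphi}_{c_2}: \mathcal{B}_{c_2}^\varphi \to \mathcal{F}_{c_2}^\varphi$ related to these functions have transition map

$$\boldsymbol{\varphi}_{c_2}^{-1} \circ \boldsymbol{\varphi}_{c_1}: \boldsymbol{\varphi}_{c_1}^{-1}(\mathcal{F}_{c_1}^\varphi \cap \mathcal{F}_{c_2}^\varphi) \to \boldsymbol{\varphi}_{c_2}^{-1}(\mathcal{F}_{c_1}^\varphi \cap \mathcal{F}_{c_2}^\varphi).$$

Let $\psi_1: \mathcal{B}_{c_1}^\varphi \to \mathbb{R}$ and $\psi_2: \mathcal{B}_{c_2}^\varphi \to \mathbb{R}$ be the normalizing functions associated to $c_1$ and $c_2$, respectively. Assume that the functions $u \in \mathcal{B}_{c_1}^\varphi$ and $v \in \mathcal{B}_{c_2}^\varphi$ are such that $\boldsymbol{\varphi}_{c_1}(u) = \boldsymbol{\varphi}_{c_2}(v) \in \mathcal{F}_{c_1}^\varphi \cap \mathcal{F}_{c_2}^\varphi$. Then we can write

$$v = c_1 - c_2 + u - (\psi_1(u) - \psi_2(v)) u_0.$$

Since the function $v$ is in $B_{c_2}^\varphi$, if we multiply this equation by $\boldsymbol{\varphi}'(c_2)$ and integrate with respect to the measure $\mu$, we obtain

$$0 = E[(c_1 - c_2 + u)\boldsymbol{\varphi}'(c_2)] - (\psi_1(u) - \psi_2(v)) E[u_0 \boldsymbol{\varphi}'(c_2)].$$

Thus the transition map $\boldsymbol{\varphi}_{c_2}^{-1} \circ \boldsymbol{\varphi}_{c_1}$ can be expressed as

$$\boldsymbol{\varphi}_{c_2}^{-1} \circ \boldsymbol{\varphi}_{c_1}(w) = c_1 - c_2 + w - \frac{E[(c_1 - c_2 + w)\boldsymbol{\varphi}'(c_2)]}{E[u_0 \boldsymbol{\varphi}'(c_2)]} u_0, \tag{7}$$

for every $w \in \boldsymbol{\varphi}_{c_1}^{-1}(\mathcal{F}_{c_1}^\varphi \cap \mathcal{F}_{c_2}^\varphi)$. Clearly, this transition map will be of class $C^\infty$ if we show that the functions $w$ and $c_1 - c_2$ are in $L_{c_2}^\varphi$, and the spaces $L_{c_1}^\varphi$ and $L_{c_2}^\varphi$ have equivalent norms. It is not hard to verify that if two Musielak-Orlicz spaces are equal as sets, then their norms are equivalent (see [?, Theorem 8.5]). We make use of the following: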

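Proposition 4. Assume that the measurable functions $\tilde{c}, c: T \to \mathbb{R}$ satisfy $E[\varphi(t, \tilde{c}(t))] < \infty$ and $E[\varphi(t, c(t))] < \infty$. Then $L_{\tilde{c}}^\varphi \subseteq L_c^\varphi$ if and only if $\tilde{c} - c \in L_c^\varphi$.

Proof. Suppose that $\tilde{c} - c$ is not in $L_c^\varphi$. Let $A = \{t \in T : \tilde{c}(t) < c(t)\}$. For $\lambda \in [0, 1]$, we have

$$\begin{aligned} E[\boldsymbol{\varphi}(c + \lambda(\tilde{c} - c))] &= E[\boldsymbol{\varphi}(c + \lambda(\tilde{c} - c)) 1_{T \setminus A}] + E[\boldsymbol{\varphi}(c + \lambda(\tilde{c} - c)) 1_A] \\ &\leq E[\boldsymbol{\varphi}(c + (\tilde{c} - c)) 1_{T \setminus A}] + E[\boldsymbol{\varphi}(c) 1_A] \\ &\leq E[\boldsymbol{\varphi}(\tilde{c})] + E[\boldsymbol{\varphi}(c)] < \infty. \end{aligned}$$

Since $\tilde{c} - c \notin L_c^\varphi$, for any $\lambda > 0$, there holds $E[\boldsymbol{\varphi}(c - \lambda(\tilde{c} - c))] = \infty$. From

$$\begin{aligned} E[\boldsymbol{\varphi}(c - \lambda(\tilde{c} - c))] &= E[\boldsymbol{\varphi}(c - \lambda(\tilde{c} - c)) 1_{T \setminus A}] + E[\boldsymbol{\varphi}(c - \lambda(\tilde{c} - c)) 1_A] \\ &\leq E[\boldsymbol{\varphi}(c) 1_{T \setminus A}] + E[\boldsymbol{\varphi}(c + \lambda(c - \tilde{c})) 1_A], \end{aligned}$$

we obtain that $(\tilde{c} - c) 1_A$ does not belong to $L_c^\varphi$. Clearly, $(\tilde{c} - c) 1_A \in L_{\tilde{c}}^\varphi$. Consequently, $L_{\tilde{c}}^\varphi$ is not contained in $L_c^\varphi$.

Conversely, assume $\tilde{c} - c \in L_c^\varphi$. Let $w$ be any function in $L_{\tilde{c}}^\varphi$. We can find $\varepsilon > 0$ such that $E[\boldsymbol{\varphi}(\tilde{c} + \lambda w)] < \infty$, for every $\lambda \in (-\varepsilon, \varepsilon)$. Consider the convex function

$$g(\alpha, \lambda) = E[\boldsymbol{\varphi}(c + \alpha(\tilde{c} - c) + \lambda w)].$$

This function is finite for $\lambda = 0$ and $\alpha$ in the interval $(-\eta, 1]$, for some $\eta > 0$. Moreover, $g(1, \lambda)$ is finite for every $\lambda \in (-\varepsilon, \varepsilon)$. By the convexity of $g$, we have that $g$ is finite in the convex hull of the set $\{1\} \times (-\varepsilon, \varepsilon) \cup (-\eta, 1] \times \{0\}$. We obtain that $g(0, \lambda)$ is finite for every $\lambda$ in some neighborhood of $0$. Consequently, $w \in L_c^\varphi$. Since $w \in L_{\tilde{c}}^\varphi$ is arbitrary, the inclusion $L_{\tilde{c}}^\varphi \subseteq L_c^\varphi$ follows. □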

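Lemma 5. If the function $u$ is in $\mathcal{K}_c^\varphi$ and we denote $\tilde{c} = c + u - \psi(u) u_0$, then the spaces $L_c^\varphi$ and $L_{\tilde{c}}^\varphi$ are equal as sets.

Proof. The inclusion $L_{\tilde{c}}^\varphi \subseteq L_c^\varphi$ follows from Proposition 4. Since $u \in \mathcal{K}_c^\varphi$, we have

$$E[\boldsymbol{\varphi}(\tilde{c} + \lambda u)] \leq E[\boldsymbol{\varphi}(c + (1 + \lambda)u)] < \infty,$$

for every $\lambda$ in a neighborhood of $0$. Thus $c - \tilde{c} = -u + \psi(u) u_0$ belongs to $L_{\tilde{c}}^\varphi$. From Proposition 4 we obtain $L_c^\varphi \subseteq L_{\tilde{c}}^\varphi$. □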

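By Lemma 5, if we denote $c_1 + u - \psi_1(u) u_0 = \tilde{c} = c_2 + v - \psi_2(v) u_0$, we have that the spaces $L_{c_1}^\varphi$, $L_{\tilde{c}}^\varphi$ and $L_{c_2}^\varphi$ are equal as sets. In (7), the function $w$ is in $L_{c_2}^\varphi$ and consequently $c_1 - c_2$ is in $L_{c_2}^\varphi$. Therefore, the transition map $\boldsymbol{\varphi}_{c_2}^{-1} \circ \boldsymbol{\varphi}_{c_1}$ is of class $C^\infty$.

Since $\boldsymbol{\varphi}_{c_2}^{-1} \circ \boldsymbol{\varphi}_{c_1}$ is of class $C^\infty$, the set $\boldsymbol{\varphi}_{c_1}^{-1}(\mathcal{F}_{c_1}^\varphi \cap \mathcal{F}_{c_2}^\varphi)$ is open in $\mathcal{B}_{c_1}^\varphi$. The $\varphi$-families $\mathcal{F}_c^\varphi$ are maximal in the sense that if two $\varphi$-families $\mathcal{F}_{c_1}^\varphi$ and $\mathcal{F}_{c_2}^\varphi$ have non-empty intersection, then they coincide.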

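Lemma 6. For a function $u$ in $\mathcal{B}_c^\varphi$, denote $\tilde{c} = c + u - \psi(u) u_0$. Then $\mathcal{F}_c^\varphi = \mathcal{F}_{\tilde{c}}^\varphi$.

Proof. Let $v$ be a function in $\mathcal{B}_c^\varphi$. Then there exists $\varepsilon > 0$ such that, for every $\lambda \in (-\varepsilon, 1 + \varepsilon)$, the function $\boldsymbol{\varphi}(c + \lambda v + (1 - \lambda)u)$ is $\mu$-integrable. Consequently, $\boldsymbol{\varphi}(\tilde{c} + \lambda(v - u))$ is $\mu$-integrable for all $\lambda \in (-\varepsilon, 1 + \varepsilon)$. Thus the difference $v - u$ is in $\mathcal{K}_{\tilde{c}}^\varphi$ and

$$w = (v - u) - \frac{E[(v - u)\boldsymbol{\varphi}'(\tilde{c})]}{E[u_0 \boldsymbol{\varphi}'(\tilde{c})]} u_0 \tag{8}$$

belongs to $\mathcal{B}_{\tilde{c}}^\varphi$. Let $\tilde{\psi}: \mathcal{B}_{\tilde{c}}^\varphi \to [0, \infty)$ be the normalizing function associated to $\tilde{c}$. Then the probability density $\boldsymbol{\varphi}(\tilde{c} + w - \tilde{\psi}(w) u_0)$ is in $\mathcal{F}_{\tilde{c}}^\varphi$. This probability density can be expressed as $\boldsymbol{\varphi}(c + v - k u_0)$ for a constant $k$. According to Proposition 3, there exists a unique $\psi(v) \in \mathbb{R}$ such that the probability density $\boldsymbol{\varphi}(c + v - \psi(v) u_0)$ is in $\mathcal{P}_\mu$. Therefore, $\mathcal{F}_c^\varphi \subseteq \mathcal{F}_{\tilde{c}}^\varphi$.

Using the same arguments as in the previous paragraph, we obtain that $c = \tilde{c} + w - \tilde{\psi}(w) u_0$, where the function $w \in \mathcal{B}_{\tilde{c}}^\varphi$ is given in (8) with $v = 0$. Thus $\mathcal{F}_{\tilde{c}}^\varphi \subseteq \mathcal{F}_c^\varphi$. □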
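By Lemma 6, if we denote $c_1 + u - \psi_1(u) u_0 = \tilde{c} = c_2 + v - \psi_2(v) u_0$, then we have the equality $\mathcal{F}_{c_1}^\varphi = \mathcal{F}_{\tilde{c}}^\varphi = \mathcal{F}_{c_2}^\varphi$.

The results obtained in these lemmas are summarized in the next proposition.

Proposition 7. Let $c_1, c_2: T \to \mathbb{R}$ be measurable functions such that the probability densities $p_1 = \boldsymbol{\varphi}(c_1)$ and $p_2 = \boldsymbol{\varphi}(c_2)$ are in $\mathcal{P}_\mu$. Suppose $\mathcal{F}_{c_1}^\varphi \cap \mathcal{F}_{c_2}^\varphi \neq \emptyset$. Then the Musielak-Orlicz spaces $L_{c_1}^\varphi$ and $L_{c_2}^\varphi$ are equal as sets, and have equivalent norms. Moreover, $\mathcal{F}_{c_1}^\varphi = \mathcal{F}_{c_2}^\varphi$.

The collection $\{(\mathcal{B}_c^\varphi, \boldsymbol{\varphi}_c)\}_{\boldsymbol{\varphi}(c) \in \mathcal{P}_\mu}$ satisfies (bm1)-(bm3), equipping $\mathcal{P}_\mu$ with a $C^\infty$-differentiable structure.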



5. Divergence 

In this section we define the divergence between two probability distributions. The entities found in Information Geometry, like the Fisher information, connections, geodesics, etc., are all derived from the divergence taken in the considered family. The divergence we will find is the Bregman divergence [15] associated to the normalizing function $\psi: \mathcal{B}_c^\varphi \to [0, \infty)$. We show that our divergence does not depend on the parametrization of the $\varphi$-family $\mathcal{F}_c^\varphi$.

Let $S$ be a convex subset of a Banach space $X$. Given a convex function $f: S \to \mathbb{R}$, the Bregman divergence $B_f: S \times S \to [0, \infty)$ is defined as

$$B_f(y, x) = f(y) - f(x) - \partial_+ f(x)(y - x),$$

for all $x, y \in S$, where $\partial_+ f(x)(h) = \lim_{t \downarrow 0} (f(x + th) - f(x))/t$ denotes the right-directional derivative of $f$ at $x$ in the direction of $h$. The right-directional derivative $\partial_+ f(x)(h)$ exists and defines a sublinear functional. If the function $f$ is strictly convex, the divergence satisfies $B_f(y, x) = 0$ if and only if $x = y$.

Let $X$ and $Y$ be Banach spaces, and $U \subseteq X$ be an open set. A function $f: U \to Y$ is said to be Gâteaux-differentiable at $x_0 \in U$ if there exists a bounded linear map $A: X \to Y$ such that

$$\lim_{t \to 0} \Bigl\| \frac{1}{t}\bigl(f(x_0 + th) - f(x_0)\bigr) - Ah \Bigr\| = 0,$$

for every $h \in X$. The Gâteaux derivative of $f$ at $x_0$ is denoted by $A = \partial f(x_0)$. If the limit above can be taken uniformly for every $h \in X$ such that $\|h\| \leq 1$, then the function $f$ is said to be Fréchet-differentiable at $x_0$. The Fréchet derivative of $f$ at $x_0$ is denoted by $A = Df(x_0)$.

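Now we verify that $\psi: \mathcal{K}_c^\varphi \to \mathbb{R}$ is a convex function. Take any $u, v \in \mathcal{K}_c^\varphi$ such that $u \neq v$. Clearly, the function $\lambda u + (1 - \lambda)v$ is in $\mathcal{K}_c^\varphi$, for any $\lambda \in (0, 1)$. By the convexity of $\varphi(t, \cdot)$, we can write

$$\begin{aligned} &E[\boldsymbol{\varphi}(c + \lambda u + (1 - \lambda)v - \lambda\psi(u) u_0 - (1 - \lambda)\psi(v) u_0)] \\ &\quad \leq \lambda E[\boldsymbol{\varphi}(c + u - \psi(u) u_0)] + (1 - \lambda) E[\boldsymbol{\varphi}(c + v - \psi(v) u_0)] = 1. \end{aligned}$$

Since $\boldsymbol{\varphi}(c + \lambda u + (1 - \lambda)v - \psi(\lambda u + (1 - \lambda)v) u_0)$ has $\mu$-integral equal to $1$, we can conclude that the following inequality holds:

$$\psi(\lambda u + (1 - \lambda)v) \leq \lambda\psi(u) + (1 - \lambda)\psi(v).$$

So we can define the Bregman divergence $B_\psi$ associated to the normalizing function $\psi$.

The Bregman divergence $B_\psi: \mathcal{B}_c^\varphi \times \mathcal{B}_c^\varphi \to [0, \infty)$ associated to the normalizing function $\psi: \mathcal{B}_c^\varphi \to [0, \infty)$ is given by

$$B_\psi(v, u) = \psi(v) - \psi(u) - \partial_+\psi(u)(v - u).$$

Then we define the divergence $D_\psi: \mathcal{B}_c^\varphi \times \mathcal{B}_c^\varphi \to [0, \infty)$ related to the $\varphi$-family $\mathcal{F}_c^\varphi$ as

$$D_\psi(u, v) = B_\psi(v, u).$$

The entries of $B_\psi$ are inverted in order that $D_\psi$ corresponds in some way to the Kullback-Leibler divergence $D_{KL}(p, q) = E[p\log(\frac{p}{q})]$. Assuming that $\varphi(t, \cdot)$ is continuously differentiable (or strictly convex), we will find an expression for $\partial\psi(u)$.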
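The reason for swapping the arguments can be seen in the classical exponential case, where the normalizing function is the cumulant-generating functional $K_p$. The sketch below (a finite space with counting measure, hypothetical names and data) checks numerically that the Bregman divergence of $K_p$ with swapped entries equals the Kullback-Leibler divergence between the corresponding densities; it uses the standard fact that the directional derivative of $K_p$ at $u$ is the expectation under the tilted density.

```python
import math

def cumulant(u, p):
    """K_p(u) = log E_p[e^u] on a finite set with counting measure."""
    return math.log(sum(pt * math.exp(ut) for ut, pt in zip(u, p)))

def tilt(u, p):
    """Density exp(u - K_p(u)) * p."""
    K = cumulant(u, p)
    return [pt * math.exp(ut - K) for ut, pt in zip(u, p)]

def bregman_cumulant(v, u, p):
    """B_K(v, u) = K(v) - K(u) - dK(u)(v - u), with dK(u)(h) = E_{p_u}[h]."""
    p_u = tilt(u, p)
    directional = sum(pt * (vt - ut) for pt, vt, ut in zip(p_u, v, u))
    return cumulant(v, p) - cumulant(u, p) - directional

def kl(p, q):
    """Kullback-Leibler divergence sum_t p log(p/q)."""
    return sum(pt * math.log(pt / qt) for pt, qt in zip(p, q))

p = [0.2, 0.3, 0.5]
u = [0.4, -1.0, 0.7]
v = [-0.3, 0.2, 0.1]
d_uv = bregman_cumulant(v, u, p)      # D(u, v) = B_K(v, u)
kl_uv = kl(tilt(u, p), tilt(v, p))    # KL between the densities at u and v
```

Here `d_uv` and `kl_uv` agree up to rounding, whereas `bregman_cumulant(u, v, p)` gives the divergence with the roles of the two densities exchanged.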

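Lemma 8. Assume that $\varphi(t, \cdot)$ is continuously differentiable. For any $u \in \mathcal{K}_c^\varphi$, the linear functional $f_u: L_c^\varphi \to \mathbb{R}$ given by $f_u(v) = E[v\boldsymbol{\varphi}'(c + u)]$ is bounded.

Proof. Every function $v \in L_c^\varphi$ with norm $\|v\|_{\Phi,0} \leq 1$ satisfies $I_\Phi(v) \leq \|v\|_{\Phi,0}$. Then we obtain

$$E[\boldsymbol{\varphi}(c + |v|)] = I_\Phi(v) + E[\boldsymbol{\varphi}(c)] \leq 2.$$

Since $u \in \mathcal{K}_c^\varphi$, we can find $\lambda \in (0, 1)$ such that $E[\boldsymbol{\varphi}(c + \frac{1}{\lambda}u)] < \infty$. We can write

$$\begin{aligned} (1 - \lambda) E[|v|\boldsymbol{\varphi}'(c + u)] &\leq E[\boldsymbol{\varphi}(c + u + (1 - \lambda)|v|)] - E[\boldsymbol{\varphi}(c + u)] \\ &= E[\boldsymbol{\varphi}(\lambda(c + \tfrac{1}{\lambda}u) + (1 - \lambda)(c + |v|))] - E[\boldsymbol{\varphi}(c + u)] \\ &\leq \lambda E[\boldsymbol{\varphi}(c + \tfrac{1}{\lambda}u)] + (1 - \lambda) E[\boldsymbol{\varphi}(c + |v|)] - E[\boldsymbol{\varphi}(c + u)]. \end{aligned}$$

Thus the absolute value of $f_u(v) = E[v\boldsymbol{\varphi}'(c + u)]$ is bounded by some constant for $\|v\|_{\Phi,0} \leq 1$. □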

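Lemma 9. Assume that $\varphi(t, \cdot)$ is continuously differentiable. Then the normalizing function $\psi: \mathcal{K}_c^\varphi \to \mathbb{R}$ is Gâteaux-differentiable and

$$\partial\psi(u)v = \frac{E[v\boldsymbol{\varphi}'(c + u - \psi(u) u_0)]}{E[u_0\boldsymbol{\varphi}'(c + u - \psi(u) u_0)]}. \tag{9}$$

Proof. According to Lemma 8, the expression in (9) defines a bounded linear functional. Fix functions $u \in \mathcal{K}_c^\varphi$ and $v \in L_c^\varphi$. In virtue of Proposition 3, we can find $\varepsilon > 0$ such that $E[\boldsymbol{\varphi}(c + u + \lambda|v|)] < \infty$, for every $\lambda \in [-\varepsilon, \varepsilon]$. Define

$$g(\lambda, k) = E[\boldsymbol{\varphi}(c + u + \lambda v - k u_0)],$$

for any $\lambda \in (-\varepsilon, \varepsilon)$ and $k \geq 0$. Since $\mathcal{K}_c^\varphi$ is open, there exists a sufficiently small $\alpha_0 > 0$ such that $u + \lambda v + \alpha|v|$ is in $\mathcal{K}_c^\varphi$ for all $\alpha \in [-\alpha_0, \alpha_0]$. We can write

$$\frac{g(\lambda + \alpha, k) - g(\lambda, k)}{\alpha} = E\Bigl[\frac{1}{\alpha}\{\boldsymbol{\varphi}(c + u + (\lambda + \alpha)v - k u_0) - \boldsymbol{\varphi}(c + u + \lambda v - k u_0)\}\Bigr].$$

The function in the expectation above is dominated by the $\mu$-integrable function $\frac{1}{\alpha_0}\{\boldsymbol{\varphi}(c + u + \lambda v + \alpha_0|v| - k u_0) - \boldsymbol{\varphi}(c + u + \lambda v - k u_0)\}$. By the Dominated Convergence Theorem,

$$E\Bigl[\frac{1}{\alpha}\{\boldsymbol{\varphi}(c + u + (\lambda + \alpha)v - k u_0) - \boldsymbol{\varphi}(c + u + \lambda v - k u_0)\}\Bigr] \to E[v\boldsymbol{\varphi}'(c + u + \lambda v - k u_0)], \quad \text{as } \alpha \to 0,$$

and, consequently,

$$\frac{\partial g}{\partial\lambda}(\lambda, k) = E[v\boldsymbol{\varphi}'(c + u + \lambda v - k u_0)].$$

Since $v\boldsymbol{\varphi}'(c + u + \lambda v - k u_0)$ is dominated by the $\mu$-integrable function $|v|\boldsymbol{\varphi}'(c + u + \varepsilon|v| - k u_0)$, we obtain for any sequence $\lambda_n \to \lambda$,

$$E[v\boldsymbol{\varphi}'(c + u + \lambda_n v - k u_0)] \to E[v\boldsymbol{\varphi}'(c + u + \lambda v - k u_0)], \quad \text{as } n \to \infty.$$

Thus $\frac{\partial g}{\partial\lambda}(\lambda, k)$ is continuous with respect to $\lambda$. Analogously, it can be shown that

$$\frac{\partial g}{\partial k}(\lambda, k) = -E[u_0\boldsymbol{\varphi}'(c + u + \lambda v - k u_0)],$$

and $\frac{\partial g}{\partial k}(\lambda, k)$ is continuous with respect to $k$. The equality $g(\lambda, k(\lambda)) = E[\boldsymbol{\varphi}(c + u + \lambda v - k(\lambda) u_0)] = 1$ defines $k(\lambda) = \psi(u + \lambda v)$ as an implicit function of $\lambda$. Notice that $\frac{\partial g}{\partial k}(0, k) < 0$. By the Implicit Function Theorem, the function $k(\lambda) = \psi(u + \lambda v)$ is continuously differentiable in a neighborhood of $0$, and has derivative

$$\frac{\partial k}{\partial\lambda}(0) = -\frac{(\partial g/\partial\lambda)(0, k(0))}{(\partial g/\partial k)(0, k(0))}.$$

Consequently,

$$\partial\psi(u)(v) = \frac{\partial\psi(u + \lambda v)}{\partial\lambda}(0) = \frac{E[v\boldsymbol{\varphi}'(c + u - \psi(u) u_0)]}{E[u_0\boldsymbol{\varphi}'(c + u - \psi(u) u_0)]}.$$

Thus the expression in (9) is the Gâteaux-derivative of $\psi$. □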
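Formula (9) can be sanity-checked against a finite-difference quotient. The following sketch assumes a three-point space with counting measure, a constant deformation parameter, and the Kaniadakis exponential as the $\varphi$-function; the names `psi`, `gateaux_formula` and all numerical values are hypothetical, and the normalizing function is computed by plain bisection rather than by any method from the paper.

```python
import math

KAPPA = 0.5  # hypothetical constant deformation parameter

def phi(x):
    """Kaniadakis exponential with parameter KAPPA."""
    return (KAPPA * x + math.sqrt(1.0 + (KAPPA * x) ** 2)) ** (1.0 / KAPPA)

def dphi(x):
    """Derivative of phi: phi(x) / sqrt(1 + KAPPA^2 x^2)."""
    return phi(x) / math.sqrt(1.0 + (KAPPA * x) ** 2)

def phi_inv(y):
    """Kaniadakis logarithm, the inverse of phi."""
    return (y ** KAPPA - y ** (-KAPPA)) / (2.0 * KAPPA)

def psi(u, c, u0, mu, iters=200):
    """Normalizing value a with sum_t phi(c + u - a*u0) mu = 1 (bisection)."""
    def mass(a):
        return sum(phi(ct + ut - a * u0t) * mt
                   for ct, ut, u0t, mt in zip(c, u, u0, mu))
    lo, hi = -1.0, 1.0
    while mass(lo) < 1.0:
        lo *= 2.0
    while mass(hi) > 1.0:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mass(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def gateaux_formula(u, v, c, u0, mu):
    """Right-hand side of (9): E[v phi'(...)] / E[u0 phi'(...)]."""
    a = psi(u, c, u0, mu)
    w = [dphi(ct + ut - a * u0t) for ct, ut, u0t in zip(c, u, u0)]
    num = sum(vt * wt * mt for vt, wt, mt in zip(v, w, mu))
    den = sum(u0t * wt * mt for u0t, wt, mt in zip(u0, w, mu))
    return num / den

mu = [1.0, 1.0, 1.0]
p = [0.2, 0.3, 0.5]
c = [phi_inv(pt) for pt in p]        # p = phi(c)
u0 = [1.0, 2.0, 0.5]                 # any positive function works on a finite space
u = [0.3, -0.2, 0.1]
v = [1.0, 0.5, -0.4]
h = 1e-5
u_plus = [ut + h * vt for ut, vt in zip(u, v)]
u_minus = [ut - h * vt for ut, vt in zip(u, v)]
finite_diff = (psi(u_plus, c, u0, mu) - psi(u_minus, c, u0, mu)) / (2.0 * h)
closed_form = gateaux_formula(u, v, c, u0, mu)
```

The central difference `finite_diff` and the value `closed_form` from (9) should agree to within the discretization error of the difference quotient.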

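Lemma 10. Assume that $\varphi(t, \cdot)$ is continuously differentiable. Then the divergence $D_\psi$ does not depend on the parametrization of $\mathcal{F}_c^\varphi$.

Proof. For any $w \in \mathcal{B}_c^\varphi$, we denote $\tilde{c} = c + w - \psi(w) u_0$. Given $u, v \in \mathcal{B}_c^\varphi$, select $\tilde{u}, \tilde{v} \in \mathcal{B}_{\tilde{c}}^\varphi$ such that $\boldsymbol{\varphi}_{\tilde{c}}(\tilde{u}) = \boldsymbol{\varphi}_c(u)$ and $\boldsymbol{\varphi}_{\tilde{c}}(\tilde{v}) = \boldsymbol{\varphi}_c(v)$. Let $\tilde{\psi}: \mathcal{B}_{\tilde{c}}^\varphi \to [0, \infty)$ be the normalizing function associated to $\tilde{c}$. These definitions provide

$$\tilde{c} + \tilde{u} - \tilde{\psi}(\tilde{u}) u_0 = c + u - \psi(u) u_0,$$

and

$$\tilde{c} + \tilde{v} - \tilde{\psi}(\tilde{v}) u_0 = c + v - \psi(v) u_0.$$

Subtracting these equations, we obtain

$$[-\tilde{\psi}(\tilde{v}) + \tilde{\psi}(\tilde{u})] u_0 + (\tilde{v} - \tilde{u}) = [-\psi(v) + \psi(u)] u_0 + (v - u)$$

and, consequently,

$$\tilde{\psi}(\tilde{v}) - \tilde{\psi}(\tilde{u}) - \frac{E[(\tilde{v} - \tilde{u})\boldsymbol{\varphi}'(\tilde{c} + \tilde{u} - \tilde{\psi}(\tilde{u}) u_0)]}{E[u_0\boldsymbol{\varphi}'(\tilde{c} + \tilde{u} - \tilde{\psi}(\tilde{u}) u_0)]} = \psi(v) - \psi(u) - \frac{E[(v - u)\boldsymbol{\varphi}'(c + u - \psi(u) u_0)]}{E[u_0\boldsymbol{\varphi}'(c + u - \psi(u) u_0)]}.$$

Therefore, $D_{\tilde{\psi}}(\tilde{u}, \tilde{v}) = D_\psi(u, v)$. □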



Let $p = \boldsymbol{\varphi}_c(u)$ and $q = \boldsymbol{\varphi}_c(v)$, for $u, v \in \mathcal{B}_c^\varphi$. We denote the divergence between the probability densities $p$ and $q$ by

$$D(p \,\|\, q) = D_\psi(u, v).$$

According to Lemma 10, $D(p \,\|\, q)$ is well-defined if $p$ and $q$ are in the same $\varphi$-family. We will find an expression for $D(p \,\|\, q)$ where $p$ and $q$ are given explicitly. For $u = 0$, we have $D(p \,\|\, q) = D_\psi(0, v) = \psi(v)$, and then

$$D(p \,\|\, q) = \frac{E[(-v + \psi(v) u_0)\boldsymbol{\varphi}'(c)]}{E[u_0\boldsymbol{\varphi}'(c)]}.$$

Therefore, the divergence between probability densities $p$ and $q$ in the same $\varphi$-family can be expressed as

$$D(p \,\|\, q) = \frac{E\Bigl[\dfrac{\boldsymbol{\varphi}^{-1}(p) - \boldsymbol{\varphi}^{-1}(q)}{(\boldsymbol{\varphi}^{-1})'(p)}\Bigr]}{E\Bigl[\dfrac{u_0}{(\boldsymbol{\varphi}^{-1})'(p)}\Bigr]}. \tag{10}$$

Clearly, the expectation in (10) may not be defined if $p$ and $q$ are not in the same $\varphi$-family. We extend the divergence in (10) by setting $D(p \,\|\, q) = \infty$ if $p$ and $q$ are not in the same $\varphi$-family. With this extension, the divergence is denoted by $D_\varphi$ and is called the $\varphi$-divergence. By the strict convexity of $\varphi(t, \cdot)$, we have the inequality $\varphi^{-1}(t, u) - \varphi^{-1}(t, v) \geq (\varphi^{-1})'(t, u)(u - v)$ for any $u, v > 0$, with equality if and only if $u = v$. Hence $D_\varphi$ is always non-negative, and $D_\varphi(p \,\|\, q)$ is equal to zero if and only if $p = q$.
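As a quick consistency check of (10), taking $\varphi = \exp$ and $u_0 = 1$ should recover the Kullback-Leibler divergence, since then $\varphi^{-1} = \log$ and $(\varphi^{-1})'(p) = 1/p$. The sketch below evaluates (10) on a finite space with point masses `mu`; the function `phi_divergence` and its argument names are ours, and the inverse of the $\varphi$-function and its derivative are passed in by the caller.

```python
import math

def phi_divergence(p, q, u0, mu, phi_inv, dphi_inv):
    """Evaluate (10) on a finite measure space.

    phi_inv  : inverse of the phi-function
    dphi_inv : derivative of that inverse
    """
    num = sum((phi_inv(pt) - phi_inv(qt)) / dphi_inv(pt) * mt
              for pt, qt, mt in zip(p, q, mu))
    den = sum(u0t / dphi_inv(pt) * mt
              for pt, u0t, mt in zip(p, u0, mu))
    return num / den

def kl(p, q, mu):
    """Kullback-Leibler divergence E[p log(p/q)]."""
    return sum(pt * math.log(pt / qt) * mt for pt, qt, mt in zip(p, q, mu))

mu = [1.0, 1.0, 1.0]
p = [0.2, 0.3, 0.5]
q = [0.4, 0.4, 0.2]
u0 = [1.0, 1.0, 1.0]
d_exp = phi_divergence(p, q, u0, mu, math.log, lambda x: 1.0 / x)
d_kl = kl(p, q, mu)
```

On a finite space every pair of strictly positive densities can be compared this way; in the general setting of the paper the formula is only meaningful when both densities lie in the same $\varphi$-family.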

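Example 11. With the variable $\kappa$-exponential $\exp_\kappa(t, u) = \exp_{\kappa(t)}(u)$ in the place of $\varphi(t, u)$, whose inverse $\varphi^{-1}(t, u)$ is the variable $\kappa$-logarithm $\ln_\kappa(t, u) = \ln_{\kappa(t)}(u)$, we rewrite (10) as

$$D(p \,\|\, q) = \frac{E\Bigl[\dfrac{\ln_\kappa(p) - \ln_\kappa(q)}{\ln'_\kappa(p)}\Bigr]}{E\Bigl[\dfrac{u_0}{\ln'_\kappa(p)}\Bigr]}, \tag{11}$$

where $\ln_\kappa(p)$ denotes $\ln_{\kappa(t)}(p(t))$. Since the $\kappa$-logarithm $\ln_\kappa(u) = \frac{u^\kappa - u^{-\kappa}}{2\kappa}$ has derivative $\ln'_\kappa(u) = \frac{1}{u}\frac{u^\kappa + u^{-\kappa}}{2}$, the numerator and denominator in (11) result in

$$E\Bigl[\frac{\ln_\kappa(p) - \ln_\kappa(q)}{\ln'_\kappa(p)}\Bigr] = E\Biggl[\frac{\dfrac{p^\kappa - p^{-\kappa}}{2\kappa} - \dfrac{q^\kappa - q^{-\kappa}}{2\kappa}}{\dfrac{1}{p}\dfrac{p^\kappa + p^{-\kappa}}{2}}\Biggr] = E_p\Bigl[\frac{1}{\kappa}\Bigl(\frac{p^\kappa - p^{-\kappa}}{p^\kappa + p^{-\kappa}} - \frac{q^\kappa - q^{-\kappa}}{p^\kappa + p^{-\kappa}}\Bigr)\Bigr],$$

and

$$E\Bigl[\frac{u_0}{\ln'_\kappa(p)}\Bigr] = E_p\Bigl[\frac{2u_0}{p^\kappa + p^{-\kappa}}\Bigr],$$

respectively. Thus (11) can be rewritten as

$$D_\kappa(p \,\|\, q) = \frac{E_p\Bigl[\dfrac{1}{\kappa}\Bigl(\dfrac{p^\kappa - p^{-\kappa}}{p^\kappa + p^{-\kappa}} - \dfrac{q^\kappa - q^{-\kappa}}{p^\kappa + p^{-\kappa}}\Bigr)\Bigr]}{E_p\Bigl[\dfrac{2u_0}{p^\kappa + p^{-\kappa}}\Bigr]},$$

which we call the $\kappa$-divergence.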
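The closed form of the $\kappa$-divergence is straightforward to evaluate. The sketch below does so on a finite space with point masses `mu`, with the deformation parameter given pointwise as a list (a toy stand-in for the measurable function $\kappa(t)$); the function name and all data are hypothetical. As $\kappa \to 0$ with $u_0 = 1$ the value should approach the Kullback-Leibler divergence, which gives a convenient numerical check.

```python
import math

def kappa_divergence(p, q, u0, mu, kappa):
    """kappa-divergence on a finite measure space; kappa is given pointwise."""
    num = 0.0
    den = 0.0
    for pt, qt, u0t, mt, kt in zip(p, q, u0, mu, kappa):
        s = pt ** kt + pt ** (-kt)                       # p^k + p^-k
        diff = (pt ** kt - pt ** (-kt)) - (qt ** kt - qt ** (-kt))
        num += pt * mt * diff / (kt * s)                 # E_p[(1/k) * (...)]
        den += pt * mt * 2.0 * u0t / s                   # E_p[2 u0 / (p^k + p^-k)]
    return num / den

def kl(p, q, mu):
    """Kullback-Leibler divergence E[p log(p/q)]."""
    return sum(pt * math.log(pt / qt) * mt for pt, qt, mt in zip(p, q, mu))

mu = [1.0, 1.0, 1.0]
p = [0.2, 0.3, 0.5]
q = [0.4, 0.4, 0.2]
u0 = [1.0, 1.0, 1.0]
d_half = kappa_divergence(p, q, u0, mu, [0.5, 0.5, 0.5])
d_small = kappa_divergence(p, q, u0, mu, [1e-4, 1e-4, 1e-4])
d_kl = kl(p, q, mu)
```

Note that the formula is not symmetric in `p` and `q`, in line with the Kullback-Leibler divergence it generalizes.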